Skip to content

fix(connector): Skip floating-point MAP key subfield extraction#27511

Open
officialasishkumar wants to merge 1 commit intoprestodb:masterfrom
officialasishkumar:fix/27507-float-map-subfield-extraction
Open

fix(connector): Skip floating-point MAP key subfield extraction#27511
officialasishkumar wants to merge 1 commit intoprestodb:masterfrom
officialasishkumar:fix/27507-float-map-subfield-extraction

Conversation

@officialasishkumar
Copy link
Copy Markdown

@officialasishkumar officialasishkumar commented Apr 5, 2026

Description

Skip subfield extraction for MAP subscripts whose key type is DOUBLE or REAL, because those keys cannot be represented losslessly as Subfield paths.

Apply the same guard in both Hive's SubfieldExtractor and planner-side PushdownSubfields, including the map_subset and map_filter helpers.

Motivation and Context

Fixes #27507.

Subfield can represent integer and string MAP subscripts, but floating-point MAP keys were being coerced through longValue(), which could incorrectly turn expressions like x[0.99] into x[0] during pruning and pushdown. The safe behavior is to leave those expressions unpushed.

Impact

Correctness fix for Hive and planner subfield pushdown. Queries over MAP columns keyed by DOUBLE or REAL keep working, but those specific key accesses no longer participate in lossy subfield pruning and pushdown.

Test Plan

  • Added TestPushdownSubfields coverage for direct MAP subscripts and map_subset/map_filter.
  • Ran ./mvnw -pl presto-main-base -am -Dair.check.skip-all -DskipUI -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TestPushdownSubfields test.
  • Ran ./mvnw -pl presto-hive -am -Dair.check.skip-all -DskipUI -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TestSubfieldExtractor,TestDomainTranslator test.
  • Added TestHiveLogicalPlanner coverage for the pushdown and pruning cases; I did not execute that class locally because the available JDK 25 runtime trips a Hadoop Subject.getSubject incompatibility during Hive query-runner initialization.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== NO RELEASE NOTE ==

Summary by Sourcery

Prevent subfield pushdown and pruning for map accesses with floating-point keys and add regression tests across planner and Hive components.

Bug Fixes:

  • Avoid lossy subfield extraction for DOUBLE and REAL map keys in planner-side subfield pushdown logic.
  • Skip subfield extraction for DOUBLE and REAL map keys in Hive's SubfieldExtractor and domain translation.

Tests:

  • Add planner-level TestPushdownSubfields to cover map subscript and map_* function behavior with numeric and floating-point keys.
  • Extend Hive logical planner, subfield extractor, and domain translator tests to verify that floating-point map key predicates are not pushed down or translated.

Floating-point map keys cannot be represented as Subfield paths without lossy coercion to LongSubscript. Keep those expressions out of subfield extraction and pushdown instead of truncating them.

Apply the guard in both Hive's SubfieldExtractor and the planner-side PushdownSubfields logic, including map_subset and map_filter helpers, and add regression coverage for the extractor, domain translation, optimizer extraction, and Hive logical planning.

Fixes prestodb#27507

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented Apr 5, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: officialasishkumar / name: Asish Kumar (e977b02)

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai bot commented Apr 5, 2026

Reviewer's Guide

Implements a correctness fix to subfield pushdown for MAPs keyed by DOUBLE or REAL by teaching both the planner and Hive subfield extraction logic to skip subfield extraction for floating‑point MAP keys, and adds focused unit and integration tests to verify no subfield pruning/pushdown occurs for these keys while preserving existing behavior for integer and varchar keys.

Sequence diagram for Hive SubfieldExtractor with floating-point MAP keys

sequenceDiagram
    actor HiveSplitReader
    participant SubfieldExtractor

    HiveSplitReader->>SubfieldExtractor: extract(expression)
    activate SubfieldExtractor
    SubfieldExtractor->>SubfieldExtractor: toSubfield(expression, functionResolution, expressionOptimizer, connectorSession)
    alt expression is MAP subscript with constant key
        SubfieldExtractor->>SubfieldExtractor: hasFloatingPointMapKey(mapType)
        alt map key type is DOUBLE or REAL
            SubfieldExtractor-->>HiveSplitReader: Optional.empty
            HiveSplitReader->>HiveSplitReader: read full MAP column (no subfield pruning)
        else map key type is integer or varchar
            SubfieldExtractor->>SubfieldExtractor: add Subfield.LongSubscript(key.longValue())
            SubfieldExtractor-->>HiveSplitReader: Optional.of(subfield)
            HiveSplitReader->>HiveSplitReader: push down subfield to storage
        end
    else expression is other supported pattern
        SubfieldExtractor-->>HiveSplitReader: Optional.of(subfield or empty)
        deactivate SubfieldExtractor
        HiveSplitReader->>HiveSplitReader: existing pruning behavior
    end
Loading

Updated class diagram for PushdownSubfields and SubfieldExtractor floating-point MAP key handling

classDiagram
    class PushdownSubfields {
        +PlanOptimizerResult optimize(PlanNode plan, Session session, TypeProvider types, ...) 
        +static Optional~List~Subfield~~~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession, FunctionAndTypeManager functionAndTypeManager, boolean isPushdownSubfieldsForMapFunctionsEnabled)
    }

    class Rewriter {
        +PlanNode rewrite(PlanNode node, Context context, PlanRewriter planRewriter)
        +static Optional~List~Subfield~~~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession, FunctionAndTypeManager functionAndTypeManager, boolean isPushdownSubfieldsForMapFunctionsEnabled)
        -static Optional~List~Subfield~~~ extractSubfieldsFromArray(ConstantExpression constantArray, VariableReferenceExpression mapVariable)
        -static Optional~Subfield~ extractSubfieldsFromSingleValue(ConstantExpression mapKey, VariableReferenceExpression mapVariable)
        -static boolean hasFloatingPointMapKey(Type type)
        -static boolean isFloatingPointType(Type type)
        -static NestedField nestedField(String name)
    }

    class SubfieldExtractor {
        -RowExpression expression
        -FunctionResolution functionResolution
        -ExpressionOptimizer expressionOptimizer
        -ConnectorSession connectorSession
        +Optional~Subfield~ extract(RowExpression expression)
        -static Optional~Subfield~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession)
        -static boolean hasSubscripts(Optional~Subfield~ subfield)
        -static boolean hasFloatingPointMapKey(Type type)
        -static boolean isFloatingPointType(Type type)
    }

    class Type {
    }

    class MapType {
        +Type getKeyType()
        +Type getValueType()
    }

    class Subfield {
        class PathElement
        +List~PathElement~ getPath()
        class LongSubscript
    }

    PushdownSubfields o-- Rewriter
    SubfieldExtractor ..> Subfield
    Rewriter ..> Subfield
    Rewriter ..> MapType
    SubfieldExtractor ..> MapType
    MapType --|> Type
    SubfieldExtractor ..> Type
    Rewriter ..> Type
    Subfield ..> Subfield.PathElement
    Subfield ..> Subfield.LongSubscript
Loading

File-Level Changes

Change Details Files
Guard subfield extraction and pruning logic so MAPs with DOUBLE/REAL keys do not produce Subfield paths.
  • Expose PushdownSubfields.Rewriter.toSubfield via a package-visible static wrapper for direct testing.
  • Teach PushdownSubfields.Rewriter.toSubfield to bail out when encountering map/array subscripts on MAP types whose key type is DOUBLE or REAL instead of coercing to long.
  • Update helper methods that extract subfields from constant arrays and single map keys to return empty when applied to MAPs with floating‑point key types.
  • Factor out hasFloatingPointMapKey and isFloatingPointType helpers to centralize the DOUBLE/REAL checks.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PushdownSubfields.java
Align Hive-side SubfieldExtractor behavior with planner by skipping floating-point MAP key subfield extraction and adding tests.
  • Add hasFloatingPointMapKey and isFloatingPointType helpers mirroring planner-side logic.
  • Short-circuit SubfieldExtractor.toSubfield for MAP subscripts when the underlying MAP key type is DOUBLE or REAL so no LongSubscript is created.
  • Extend TestSubfieldExtractor with REAL/DOUBLE MAP-key cases that must not yield Subfield paths.
  • Extend TestDomainTranslator to ensure predicates over DOUBLE/REAL MAP subscripts do not translate to subfield domains.
presto-hive-common/src/main/java/com/facebook/presto/hive/SubfieldExtractor.java
presto-hive/src/test/java/com/facebook/presto/hive/TestSubfieldExtractor.java
presto-hive/src/test/java/com/facebook/presto/hive/TestDomainTranslator.java
Extend planner and Hive integration tests to cover pushdown/pruning behavior for DOUBLE/REAL MAP keys and map_subset/map_filter.
  • Add TestPushdownSubfields covering map subscript, map_subset, and map_filter extraction, including negative cases for DOUBLE/REAL key maps and a small mapType helper.
  • Extend TestHiveLogicalPlanner’s test tables with DOUBLE/REAL MAP columns and assert that filter pushdown uses predicateColumns without generating domain predicates for floating-point MAP keys.
  • Add map_subset/map_filter scenarios for MAP<double,double> to assert that no subfield pushdown occurs when keys are floating point.
  • Introduce an assertNoPushdownFilterOnSubfields helper that inspects HiveTableLayoutHandle to verify absence of domainPredicate while ensuring remainingPredicate is preserved.
presto-main-base/src/test/java/com/facebook/presto/sql/planner/optimizations/TestPushdownSubfields.java
presto-hive/src/test/java/com/facebook/presto/hive/TestHiveLogicalPlanner.java

Assessment against linked issues

Issue Objective Addressed Explanation
#27507 Change SubfieldExtractor so that floating-point MAP key subscripts (DOUBLE/REAL) are explicitly rejected instead of being silently truncated and used as LongSubscript, ideally with a clear user-visible error message. The PR changes SubfieldExtractor to detect floating-point MAP key types and return Optional.empty() (skipping subfield extraction) instead of constructing a LongSubscript from the numeric index. This avoids truncation but does not raise any error or user-visible rejection; queries continue to run, just without subfield pruning. The behavior is "skip optimization" rather than "reject with a clear error" as requested in the issue.
#27507 Eliminate the unsafe reliance on truncating floating-point MAP key subscripts to longs for subfield pruning/pushdown across the optimizer and Hive reader, by ensuring planner-side subfield extraction and related map_* helpers do not attempt subfield pushdown for floating-point map keys.

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The floating-point map-key checks (hasFloatingPointMapKey / isFloatingPointType) are now duplicated in both PushdownSubfields and SubfieldExtractor; consider centralizing this logic in a shared utility (or at least aligning naming and placement) to reduce divergence risk.
  • The new REAL-key tests use floatToRawIntBits to build REAL constants, which is a bit opaque; adding a small helper or comment explaining why the bit-cast is required would make these tests easier to maintain and less error-prone.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The floating-point map-key checks (`hasFloatingPointMapKey` / `isFloatingPointType`) are now duplicated in both `PushdownSubfields` and `SubfieldExtractor`; consider centralizing this logic in a shared utility (or at least aligning naming and placement) to reduce divergence risk.
- The new REAL-key tests use `floatToRawIntBits` to build `REAL` constants, which is a bit opaque; adding a small helper or comment explaining why the bit-cast is required would make these tests easier to maintain and less error-prone.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reject float MAP key subscripts in SubfieldExtractor instead of silently truncating

1 participant